[% setvar title Split Scalars and Objects/References into Two Types %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='TITLE'></a><h1>TITLE</h1>
<p>Split Scalars and Objects/References into Two Types</p>
<a name='VERSION'></a><h1>VERSION</h1>
<pre>  Maintainer: Nathan Wiger &lt;<a href='mailto:nate@wiger.org'>nate@wiger.org</a>&gt;
  Date: 24 Aug 2000
  Last Modified: 25 Aug 2000
  Mailing List: <a href='mailto:perl6-language@perl.org'>perl6-language@perl.org</a>
  Number: 147
  Version: 2
  Status: Retracted</pre>
<a name='STATUS'></a><h1>STATUS</h1>
<p>Sometimes, you start down a train of thought, and you wind up with
something that you think is a beautiful idea.</p>
<p>But later on it turns out that, in reality, it sucks. Bad. REALLY Bad.</p>
<p>This is one of those ideas. There's so many problems with it I don't
know where to begin. It has some interesting features, but not enough to
destroy Perl for.</p>
<p>Anyways, there's much better solutions to the issues. Check out the
discussions on -objects about RFCs 159 and 161 in particular. Feel free
to read this RFC if you want to become violently ill.</p>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>Many RFC's have been proposed dealing with Perl's current prefixing
system. Many have criticized it, even suggesting to drop prefixes
altogether, but the vast majority of these criticisms center around one
thing: Not being able to tell the difference between &quot;true&quot; scalars and
references/objects.</p>
<p>The answer to this has frequently been that the $scalar type defines not
a quality, but a quantity: one.  While true in one sense, Perl does have
both @arrays and %hashes, notations which do the opposite: define a
quality.  While they are both composed of multiple things, those things
are fundamentally different, and hence the need for two notations.</p>
<p>References and true scalars (&quot;scalars&quot; henceforth) are also
fundamentally different things. Therefore, this RFC proposes that they
be split into two separate types in Perl 6: $scalars and *references.</p>
<p>Please take the time to read this. It is a thorough document that
explores both the pros AND cons of this approach.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>Please note that the * is used as the reference prefix in this
document.  Nonetheless, this proposal has NOTHING to do with typeglobs,
and using the * prefix would require typeglobs to disappear. The prefix
is debatable, let's focus on the idea.</p>
<a name='The Problem'></a><h2>The Problem</h2>
<p>Everyone on this list should be familiar with the problem.  You can't
tell scalars and references apart by looking at them. They are
completely ambiguous. Consider:</p>
<pre>   $stuff{key}      # hash value
   $stuff[0]        # array value
   $stuff           # scalar or reference?

   %stuff           # hash
   @stuff           # array
   $stuff           # scalar or reference?</pre>
<p>You get the idea. In simple examples, this ambiguity goes away because
of the way you address them:</p>
<pre>   $stuff           # scalar or reference?
   $stuff-&gt;{key}    # hash reference
   $stuff-&gt;[0]      # array reference
   $stuff-&gt;func     # object reference</pre>
<p>Once we use them, we recognize that the last three are references.
However, there are many situations when we can't disambiguate them:</p>
<pre>   $stuff = Class-&gt;method    # scalar or reference?
   read_data($stuff)         # scalar or reference?
   $$stuff                   # scalar or reference?</pre>
<p>Good variable names help, but the fact remains that these are
fundamentally different types. For example, you know what will happen
when you do this:</p>
<pre>   print @stuff              # the contents of @stuff   
   while(($k,$v) = each %h)) # each key/value of hash %h</pre>
<p>But what about these examples?</p>
<pre>   $z = $x + $y              # numeric addition?
   $stuff ||= &quot;Bob&quot;          # is this correct?
   print ref $stuff          # sure about that?</pre>
<p>Anyone who has worked on a large project in Perl will relate to the fact
that even this code block is difficult to read unless you know exactly
what the functions are supposed to return:</p>
<pre>   $c = Class-&gt;new;              # ok (maybe) so far...
   ($u, $g) = $c-&gt;get_user;
   ($d, $f, $j, $m) = $c-&gt;read_data;

   # ... lots of code passes ...

   print &quot;Thanks $u for your &quot;, $g-&gt;purchase;
   $member = $d || $CLASS_DEFAULT;      # scalars or ????
   while (($k,$v) = each %$f) {
       print &quot;Now updating info for $k...\n&quot;;
       push @$j, @{ $v-&gt;{'personal'} };
       $name = $member-&gt;name($$v-&gt;{name}-&gt;{first}) || &quot;unknown&quot;;
   } 
   print &quot;Your confirmation number is $m\n&quot;;</pre>
<p>Here, we've lost the advantage of prefixes. While there are lots of $'s
in the code, they don't tell us anything useful other than &quot;there's a
single something here&quot;. [1]</p>
<p>However, note that the @ and % still do, because they indicate quality
in addition to quantity.</p>
<a name='The Proposed Solution'></a><h2>The Proposed Solution</h2>
<p>This RFC proposes that references and scalars be split into two distinct
types. This is not a novel approach.  It is also not a simple one. There
are many issues to address.</p>
<p>However, the dividing line is clear:</p>
<pre>   Type              Definition
   ----------------- ----------------------------------
   $scalar           Holds a single simple value, such
                     as a string or number

   *reference        Holds a single complex value, such
                     as a hash reference or object</pre>
<p>Like %hashes versus @arrays, these two new types have the advantage that
they can describe both a quantity and a quality.  RFC 23, &quot;Higher order
functions&quot; adds a new variable prefix, ^, because it introduces a
fundamental new type: the placeholder.  The same idea should apply here
as well.</p>
<p>Consider the above sample code now:</p>
<pre>   *c = Class-&gt;new; 
   ($u, *g) = *c-&gt;get_user;
   (*d, *f, *j, $m) = *c-&gt;read_data;

   # ... lots of code passes ...

   print &quot;Thanks $u for your &quot;, *g-&gt;purchase;
   *member = *d || *CLASS_DEFAULT;     # oh, they're refs!
   while (($k,*v) = each %*f) {
       print &quot;Now updating info for $k...\n&quot;;
       push @*j, @{$*v{personal}};
       $name = *member-&gt;name($*v{name}{first}) || &quot;unknown&quot;;
   } 
   print &quot;Your confirmation number is $m\n&quot;;</pre>
<p>We now have a much better idea of what's going on. For one thing, we can
instantly tell the difference between our $scalar assignments and our
*reference assignments.  We can now be 100% sure that $name really holds
a scalar, and *c really holds a reference of some type. [2]</p>
<p>Moreover, with the complete syntax described below, we can actually tell
our objects and references apart more easily than with the current &quot;all
refs use -&gt;&quot; approach.</p>
<a name='Syntax'></a><h2>Syntax</h2>
<p>Basically, all references and objects will be prefixed with a *. The
single * will take the place of the current $-&gt; combination syntax
required for references. All scalars
will continue to be prefixed with a $.</p>
<p>This table should help illustrate the new syntax:</p>
<pre>  Perl 5                   Perl 6
  ------------------------ --------------------------
  $x = 5;                  $x = 5;
  $name = &quot;Nate&quot;;          $name = &quot;Nate&quot;;

  $_ = 1;                  $_ = 1;
  $_-&gt;func;                *_-&gt;func;     # new *_ anon ref

  # References
  $n = \$name;             *n = $name;   # \ redundant
  print $$n;               print $*n; 

  $hash{key} = $val;       $hash{key} = $val;
  $hash = \%hash;          *hash = %hash;
  $hash-&gt;{key}             $*hash{key}   # -&gt; redundant
  
  $array[3] = &quot;Nate&quot;;     $array[3] = &quot;Nate&quot;;
  $aref = \@array;        *aref = @array;
  $aref-&gt;[0]              $*aref[0]    # -&gt; redundant
  $aref-&gt;[0][1]           $*aref[0][1]
  @{$aref-&gt;[0][1]}        @{*aref[0][1]}

  # Objects -- keep -&gt; to make objects stand out
  $c = Class-&gt;func;        *c = Class-&gt;func;
  [no equivalent]          $c = Class-&gt;func;
  $r = $c || $DEFAULT;     *r = *c || *DEFAULT;
  print $c-&gt;name;          print *c-&gt;name;   # $*c-&gt;name implied
  print @{$c-&gt;name};       print @{*c-&gt;name};

  # Combinations
  $val = $c-&gt;func || $D-&gt;{VAL};
                           $val = *c-&gt;func || $*D{VAL};

  $name = $h-&gt;{name} || $a-&gt;[3][1];
                           *name = *h{name} || *a[3][1]</pre>
<p>Notice this buys us another key advantage. Since the * delimits
references, we no longer are forced to use -&gt; in all references. As
such, -&gt; can now be reserved solely for objects, allowing us to easily
locate our objects now.</p>
<p>The exact syntax is still open for discussion. There are many more
clarifying examples below as well.</p>
<a name='The Advantages'></a><h2>The Advantages</h2>
<p>There are many advantages to this approach. There are also potential
problems, which will be addressed second.</p>
<a name='Ability to differentiate types'></a><h3>Ability to differentiate types</h3>
<p>As mentioned above, this is the key advantage to this approach. Not only
is the code clearer, but we can now actually tell what we want from a
function without complex casting or syntax rules:</p>
<pre>   *object  =  date;     # object with accessor functions
   $scalar  =  date;     # scalar ctime date string</pre>
<p>Normally, the date() function would either have to choose what to
return, or would require specific casting in order to return the correct
thing. And, once it's returned, later in the code you still can't easily
tell what you got. Differentiating between *references and $scalars
allows us to do both.</p>
<p>Plus, as the table above shows, many references (such as
multidimensional array references) become much easier to read.</p>
<a name='Integration with want() to return refs'></a><h3>Integration with want() to return refs</h3>
<p>Currently, the new want() proposed by RFC 21 relies on odd syntax to
differentiate between hashrefs and arrayrefs:</p>
<pre>   $arrayref = @{func()};
   $hashref  = %{func()};</pre>
<p>While this works for simple situations, it requires some contortions in
complex situations:</p>
<pre>   $hashref  = %{func() || $def || $r-&gt;getdata()};</pre>
<p>By differentiating our references with this new syntax, we can move this
to the left side of the assignment, which is where it belongs:</p>
<pre>   @*arrayref = func();
   %*hashref  = func();
   %*hashref  = func() || *def || *r-&gt;getdata();</pre>
<p>The code is cleaner and easier to read, and can more more easily span
multiple lines. [4] When integrated with the previous point, we can now
easily tell apart objects and refs even in assignment contexts.</p>
<a name='Leftmost $ tells true context'></a><h3>Leftmost $ tells true context</h3>
<p>Currently, if the leftmost prefix is a $, the actual context is still
ambiguous:</p>
<pre>   print $$$$name;    # ref or scalar?
   $d =  $$$$name;    # ref or scalar?</pre>
<p>However, with our new syntax, $ is only used to delimit an actualy
scalar. So:</p>
<pre>   print $***name;    # scalar
   *d =  ****name;    # ref</pre>
<p>This is also extended to arrays and hashes:</p>
<pre>   @***name     # array
   %***name     # hash</pre>
<p>Note that this syntax is less confusing than this:</p>
<pre>   @$$$name
   %$$$name</pre>
<p>Where you have to <i>infer</i> that $$$name is a ref. The *, by contrast,
tells us this explicitly.</p>
<a name='No Left/Right cross-scanning to determine type'></a><h3>No Left/Right cross-scanning to determine type</h3>
<p>Currently, if you're reading a complex type in Perl you have to scan
left to right to figure out it's a ref:</p>
<pre>   @{ $r-&gt;{$key} }</pre>
<p>You first see the @, which tells you it's an array. However, the next
thing you see is a $, which leads you to believe it's a scalar, until
you see the -&gt;. In this simple isolated example, it's not too bad.  But
bury this in a chunk of code, and it's very hard to read and use.</p>
<p>Contrast this with the new syntax:</p>
<pre>   @{*r{$key}}</pre>
<p>Not only is this more compact, but by simply reading left to right you
can determine it's an array of a ref to a hash.</p>
<a name='Implicit type checking'></a><h3>Implicit type checking</h3>
<p>Currently, you can get to an individual element of an array or hash with
the syntax:</p>
<pre>   $array[0]       # first scalar or ref of @array
   $hash{key}      # scalar or ref indexed by &quot;key&quot;</pre>
<p>Differentiating the two would require that the prefix also reflect the
return value:</p>
<pre>   $array[0]       # first element (scalar) of @array
   *array[0]       # first element (ref) of @array
   $hash{key}      # &quot;key&quot;-indexed element (scalar) of %hash
   *hash{key}      # &quot;key&quot;-indexed element (ref) of %hash</pre>
<p>The structure of @array and %hash would remain identical, in that each
slot could only hold one element (so $array[0] and *array[0] would point
to the same data, only &quot;casting&quot; them differently). However, you'd have
to make sure that you called them with the correct prefix:</p>
<pre>   my @array;               # ok
   *array[0] = new Class;   # ok
   $array[0]-&gt;func();       # wrong
   *array[0]-&gt;func();       # ok
   $stuff = *array[0];      # wrong
   $stuff = $array[0];      # ok (but empty)</pre>
<p>This actually gives us a key advantage. Here, Perl can now return the
value if it's the correct type, or undef otherwise. As such:</p>
<pre>   $array[3] = &quot;Nate&quot;;      # ok
   print $array[3];         # ok
   *array[3]-&gt;func();       # *array[3] returns undef</pre>
<p>So, * is acting a little like an implicit ref call. This means that
we'll be guaranteed to be getting what we want without nasty ref checks
all over the place. It also means that our error messages can be
clearer:</p>
<pre>   array[3] is a scalar value, not a reference at line 3.</pre>
<p>Note that this approach is analagous from a user level to how hashes and
arrays work as well:</p>
<pre>   $array[3] = &quot;Nate&quot;;      # ok
   print $array[3];         # ok
   print $array{3};         # undef</pre>
<p>Just like [] and {} are used to get at different types of values, so too
are $ and * now used to get to different types of values.</p>
<a name='Ability to avoid returning objects'></a><h3>Ability to avoid returning objects</h3>
<p>RFC 49, &quot;Objects should have builtin stringifying STRING method&quot;, and
RFC 73, &quot;All Perl core functions should return objects&quot;, together
describe an interface which could allow easily embedded core objects
with true polymorphic capabilities.</p>
<p>For example, using these methods the date() function would not have to
choose what to return in a scalar context. Instead, an object would be
returned which could morph into a &quot;true&quot; scalar value as needed. This
approach is quite powerful in embedding objects at a low-level.</p>
<p>However, there are some problems. For one thing, sometimes you really
want an actual scalar:</p>
<pre>   $x = Math-&gt;calc;   # create $x object
   $z = $x * $y;      # cast via $x-&gt;NUMBER to scalar
   $m = $x;           # object reassignment (oops?)</pre>
<p>If $x is a polymorphic object, there are a couple of disadvantages:</p>
<pre>  1. There's a lot of overhead in maintaining objects.

  2. Lines 2 and 3 work completely differently, which
     is counterintuitive.

  3. You still can't tell what's an object and what's
     a scalar by looking at the code.</pre>
<p>By differentiating the types, however, we can explictly ask for what we
want to get back:</p>
<pre>   *x = Math-&gt;calc;   # same function overloaded to
   $x = Math-&gt;calc;   # return both
   $z = $x * $y;      # true scalar math
   *m = *x;           # true reference assignment</pre>
<p>Now it's obvious what's going on. We know what we're getting, we know
what we are passing around, and there's no wasted overhead for passing
objects around if we don't need to.</p>
<a name='Better cognitive properties'></a><h3>Better cognitive properties</h3>
<p>Perl prides itself on being a human-friendly language.  Indeed, it is.
And fun! Mostly.</p>
<p>However, trying to figure out what's a scalar and what's an object just
by looking at a whole bunch of $'s is not fun. And let's face it, most
people regard single values and references to be totally different
things. And they are, just like arrays and hashes are totally different.</p>
<p>Many Perl programmers I've taught are quite surprised the first (and
second, and third, and fourth) time they see something like this:</p>
<pre>   $cgi = new CGI || $oldcgi;
   $user = $cgi-&gt;param($name) || $ENV{REMOTE_USER};
   print &quot;Welcome $user, to &quot; . $cgi-&gt;server_name;</pre>
<p>This is confusing. Only $scalars can play such double-duty.  %hashes and
@arrays can only hold hashes and arrays. But Perl 5 $scalars can
currently hold values or references, giving Perl 5 OO a &quot;bolted on the
side&quot; appearance. [3]</p>
<p>While we have become accustomed to looking at it, the simple fact is
it's confusing. The same &quot;type&quot; is being used for things with different
qualities, and it's just as confusing as if arrays and hashes had the
same prefix.</p>
<p>Now, consider this code:</p>
<pre>   *cgi = new CGI || *oldcgi;
   $user = *cgi-&gt;param($name) || $ENV{REMOTE_USER};
   print &quot;Welcome $user, to &quot; . *cgi-&gt;server_name;</pre>
<p>This is less confusing. It is now apparent where our *objects are and
where our $scalars are. The code is more easily readable and can also be
more specific in what types it is requesting. You can instantly pick out
your *objects and $scalars at a glance.</p>
<a name='Increased namespace'></a><h3>Increased namespace</h3>
<p>This is a simple one, but should not be overlooked. With a separate *
type, we now have to ability to have both $name and *name, where $name
could be the actual value and *name could be a reference to $name.
Increases in namespace are basically always beneficial.</p>
<a name='Potential Problems'></a><h2>Potential Problems</h2>
<p>This approach is not without its problems. These need to be carefully
addressed before this can be implemented.</p>
<a name='Iterating through a mixed array'></a><h3>Iterating through a mixed array</h3>
<p>Consider an array with mixed references and scalars:</p>
<pre>   @mixed = qw($x $y *cgi $z *apache);</pre>
<p>A Perl 5 loop would do something like this (remember all the prefixes
will be $ unlike what's shown above):</p>
<pre>   for ( @mixed ) {
      if ( ref $_ ) {
         $name .= $_-&gt;func;
      } else {
         $sum += $_;
      }
   }</pre>
<p>Now consider the Perl 6 for loop over it, where the prefixes are mixed</p>
<pre>   for ( @mixed ) {
       if ( $_ ) {   # uh-oh
       # ...
   }</pre>
<p>The problem is that the builtin anonymous scalar $_ can now only handle
scalars, since we've separated the types. Luckily, we have already added
a new anonymous reference, *_. As such, our for loop can now obey the
following rules:</p>
<pre>   1. $_ and *_ are initialized to undef each loop

   2. only the appropriate one is set depending on
      whether the element is a scalar or reference</pre>
<p>So, our for loop becomes:</p>
<pre>   for ( @mixed ) {
      if ( *_ ) {
         $name .= *_-&gt;func;
      } else {
         $sum += $_;
      }
   }</pre>
<p>Note that the length of the code is exactly the same, and that the test
no longer requires a ref() call.  In this specific example, I would
argue this could be seen as a benefit, because you can use it to only
address certain elements of the array:</p>
<pre>   for ( @mixed ) {
      $sum += $_ if $_;    # only address true scalars
   }</pre>
<p>This will implicitly only get true scalars, since all the references
will be placed in *_ (and the loop ignores them).</p>
<p>However, the situation gets ugly when you want to name your loop
parameters:</p>
<pre>   for my $line ( @mixed ) {
      # uh-oh
   }</pre>
<p>Now, we've explicitly told the for loop to use $line instead of $_. But
this can only hold scalars, not references. There are two possibile
solutions:</p>
<pre>   1. Put the scalars in $line, but the references
      in *_ still (and vice versa if they had said
      &quot;my *line&quot;).

   2. Automatically convert the name of $line to
      *line if necessary.

   3. Do #1, but require them to use a HOFN if they
      want automatic conversion (i.e., &quot;^line&quot;)</pre>
<p>None of these is a perfect solution. This is an issue that needs further
examination. Note that an analagous problem exists with mixed hashes.</p>
<a name='Conclusions'></a><h3>Conclusions</h3>
<p>Differentiating scalars and references gives us many key advantages:</p>
<pre>   1. Ability to differentiate types in all contexts

   2. More compact, readable, and flexible syntax

   3. Leftmost $ tells true context

   4. Implicit type checking

   5. Better cognitive properties

   6. Increased namespace</pre>
<p>Initially, I started writing this RFC as an idea that should be at least
considered, but not necessarily one that I thought would really benefit
Perl 6.</p>
<p>However, I have since completely changed my mind. As I went through the
process, I found that there were many, many deficiencies with the
current &quot;stick everything in $&quot; philosophy of Perl 5. I believe that
this proposal could benefit Perl 6 tremendously.</p>
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<p>If this gets accepted, it will require a massive rewrite of Perl's
internals. Thank heavens we're doing that anyways. :-)</p>
<a name='NOTES'></a><h1>NOTES</h1>
<p>[1] The &quot;single&quot; part isn't true, either, since $arrayrefs and $hashrefs
in fact point to multiple values.  Some will counter with the idea that
$arrayrefs and $hashrefs actually point to &quot;single&quot; arrays and hashes.
However, this is a stretch in the author's opinion.</p>
<p>[2] Opponents might poke at this, claiming that you only know *
represents a reference &quot;of some type&quot;. What next, individual types for
all the different numbers, filehandles, and so forth? However, this is a
slippery slope fallacy.  Numbers and strings are types of scalars, just
like objects and filehandles are types of references.</p>
<p>[3] Andy Wardley came up with this one. :-)</p>
<p>[4] Note that this does not have to replace the syntax in RFC 21, but
can serve as an alternative form.</p>
<a name='MIGRATION'></a><h1>MIGRATION</h1>
<p>The p52p6 translator would have to convert all Perl 5 reference and
object notations into the new Perl 6 syntax. This is not actually as
hard as it sounds since a few regexp's would do the trick for
references. Objects are trickier and would require some backtracking
through the code.</p>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p>RFC 49: Objects should have builtin stringifying STRING method</p>
<p>RFC 73: All Perl core functions should return objects</p>
<p>RFC 9: Highlander Variable Types</p>
<p>RFC 109: Less line noise - let's get rid of @%</p>
<p>RFC 95: Object Classes</p>
<p>Upcoming RFC by brian d foy and myself on polymorphic objects</p>
<p>RFC 21: Replace <code>wantarray</code> with a generic <code>want</code> function</p>
<p>RFC 23: Higher order functions</p>
</div>
